Robust Cluster Analysis via Mixture Models
Abstract
Finite mixture models are increasingly used to model the distributions of a wide variety of random phenomena and to cluster data sets. In this paper, we focus on the use of normal mixture models to cluster sets of continuous multivariate data. As normality-based methods of estimation are not robust, we review the use of t component distributions. With the t mixture model-based approach, the normal distribution for each component in the mixture model is embedded in a wider class of elliptically symmetric distributions with an additional parameter called the degrees of freedom. The advantage of the t mixture model is that, although the number of outliers needed for breakdown is almost the same as with the normal mixture model, the outliers have to be much larger. We also consider the use of the t distribution for the robust clustering of high-dimensional data via mixtures of factor analyzers. The latter enable a mixture model to be fitted to data whose dimension is high relative to the number of data points to be clustered.
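To illustrate why t components resist outliers, the following minimal sketch computes the standard E-step weights for a single multivariate t component, u_j = (nu + p) / (nu + d_j^2), where d_j^2 is the squared Mahalanobis distance of observation j from the component mean. It is not taken from the paper: the function names, the toy data, and the choice nu = 4 are assumptions made for illustration, and it shows one weighted mean update for one component rather than a full EM fit of a mixture.

```python
import numpy as np

def t_weights(X, mu, Sigma, nu):
    """Hypothetical helper: weights u_j = (nu + p) / (nu + d_j^2) for one t component."""
    X = np.asarray(X, dtype=float)
    p = X.shape[1]
    diff = X - mu
    # Squared Mahalanobis distance of each row from mu.
    d2 = np.einsum("ij,ij->i", diff @ np.linalg.inv(Sigma), diff)
    return (nu + p) / (nu + d2)

def weighted_mean(X, w):
    """Hypothetical helper: mean update in which each point counts in proportion to its weight."""
    X = np.asarray(X, dtype=float)
    return (w[:, None] * X).sum(axis=0) / w.sum()

# Toy data (assumed): five points near the origin and one gross outlier.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
              [-1.0, 0.0], [0.0, -1.0], [50.0, 50.0]])
mu0 = np.zeros(2)      # current estimate of the component mean
Sigma0 = np.eye(2)     # current estimate of the scale matrix
w = t_weights(X, mu0, Sigma0, nu=4.0)

robust_mean = weighted_mean(X, w)   # t-style update: the outlier is strongly downweighted
plain_mean = X.mean(axis=0)         # normal-style update: every point has equal weight
```

Under these assumptions the outlier receives a weight close to zero, so the weighted mean stays near the origin while the unweighted mean is pulled far toward the outlier. As nu grows, the weights tend to 1 and the update approaches the normal-component case.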
Similar resources
Mixture model of Gaussian copulas to cluster mixed-type data
A mixture model of Gaussian copulas is proposed to cluster mixed data. This approach makes it straightforward to define simple multivariate intra-class dependency models while preserving classical distributions for the one-dimensional margins of each component, which facilitates model interpretation. Moreover, the intra-class dependencies are taken into account by the Gaussian copulas w...
Robust Estimation in Gaussian Mixtures Using Multiresolution Kd-trees
For many applied problems in the context of clustering via mixture models, the estimates of the component means and covariance matrices can be affected by observations that are atypical of the components in the mixture model being fitted. In this paper, we consider for Gaussian mixtures a robust estimation procedure using multiresolution kd-trees. The method provides a fast EM-based approach to...
Consistency, Breakdown Robustness, and Algorithms for Robust Improper Maximum Likelihood Clustering
The robust improper maximum likelihood estimator (RIMLE) is a new method for robust multivariate clustering finding approximately Gaussian clusters. It maximizes a pseudolikelihood defined by adding a component with improper constant density for accommodating outliers to a Gaussian mixture. A special case of the RIMLE is MLE for multivariate finite Gaussian mixture models. In this paper we trea...
A robust EM clustering algorithm for Gaussian mixture models
Clustering is a useful tool for finding structure in a data set. The mixture likelihood approach to clustering is a popular clustering method, in which the EM algorithm is the most used method. However, the EM algorithm for Gaussian mixture models is quite sensitive to initial values and the number of its components needs to be given a priori. To resolve these drawbacks of the EM, we develop a ...
Robust mixture modelling using the t distribution
Normal mixture models are being increasingly used to model the distributions of a wide variety of random phenomena and to cluster sets of continuous multivariate data. However, for a set of data containing a group or groups of observations with longer than normal tails or atypical observations, the use of normal components may unduly affect the fit of the mixture model. In this paper, we consid...